Read data

head(oasis_dice)
##   ...1 participant_id session_id dice_bet_probability darq_probability
## 1    0 sub-OASIS10002    ses-M00            0.8926427     0.8916364312
## 2    1 sub-OASIS10010    ses-M00            0.8117313     0.0001414413
## 3    2 sub-OASIS10011    ses-M00            0.9305678     0.9999741316
## 4    3 sub-OASIS10037    ses-M00            0.9230336     0.6704366803
## 5    4 sub-OASIS10042    ses-M00            0.8931396     0.9995048046
## 6    5 sub-OASIS10045    ses-M00            0.9522836     0.9449698329
##   darq_pass dice_hd_bet_probability mutual_info correlation_ratio   norm_mi
## 1      TRUE               0.9095625   0.8533593         0.7847766 0.2011821
## 2     FALSE               0.8299690   0.7254140         0.6766012 0.1741843
## 3      TRUE               0.9405499   0.8728209         0.8159280 0.2073242
## 4      TRUE               0.9128021   0.7606033         0.7152034 0.1821181
## 5      TRUE               0.9225100   0.7154123         0.7272557 0.1761768
## 6      TRUE               0.9409483   0.7981154         0.7787458 0.1938449
##   correlation_coef     cr_l1
## 1        0.6140392 0.6578466
## 2        0.4409037 0.5621014
## 3        0.6240867 0.6902325
## 4        0.5447207 0.6028452
## 5        0.6027540 0.6089401
## 6        0.6402254 0.6620776

Scatter plot

The scatter plots illustrate the relationship between each metric and DARQ probability.

Dice BET

The correlation between dice_bet_probability and darq_probability is 0.6859204.

Dice HD-BET

The correlation between dice_hd_bet_probability and darq_probability is 0.7552805.

Mutual information

The correlation between mutual_info and darq_probability is 0.5150382.

Correlation ratio

The correlation between correlation_ratio and darq_probability is 0.7392367.

Normalized mutual information

The correlation between norm_mi and darq_probability is 0.6330968.

Correlation coefficient

The correlation between correlation_coef and darq_probability is 0.7570659.

L1-norm correlation ratio (cr_l1)

The correlation between cr_l1 and darq_probability is 0.7549297.
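
As a rough sketch of how correlations like those above could be computed, the snippet below applies cor() to each metric against darq_probability. It uses only the six rows printed by head(oasis_dice) and a subset of the metrics, so its values will not match the reported ones. It also assumes the Pearson coefficient (the cor() default), since the method is not stated in this report. The name oasis_head is hypothetical.

```r
# Six rows copied from the head(oasis_dice) output; not the full data set.
oasis_head <- data.frame(
  darq_probability        = c(0.8916364312, 0.0001414413, 0.9999741316,
                              0.6704366803, 0.9995048046, 0.9449698329),
  dice_bet_probability    = c(0.8926427, 0.8117313, 0.9305678,
                              0.9230336, 0.8931396, 0.9522836),
  dice_hd_bet_probability = c(0.9095625, 0.8299690, 0.9405499,
                              0.9128021, 0.9225100, 0.9409483),
  correlation_coef        = c(0.6140392, 0.4409037, 0.6240867,
                              0.5447207, 0.6027540, 0.6402254)
)

# Every column except DARQ is treated as a candidate metric.
metrics <- setdiff(names(oasis_head), "darq_probability")

# Pearson correlation (assumed) of each metric with DARQ probability.
cors <- vapply(
  oasis_head[metrics],
  function(m) cor(m, oasis_head$darq_probability),
  numeric(1)
)
print(cors)
```

Passing method = "spearman" to cor() would give a rank-based alternative, which may be worth checking given how skewed the DARQ scores are.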

Correlation plot

This figure shows how the metrics correlate with one another. In general, all pairwise correlations are positive, which suggests that the metrics capture similar information. The darker squares, however, mark the metrics that cluster together based on their similarity.

corrplot::corrplot(cor(oasis_dice_num), method = "shade", order = "hclust", hclust.method = "ward.D2")
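
To illustrate what ordering a correlation matrix by hierarchical clustering means, here is a minimal base R sketch on simulated data. It assumes the dissimilarity is 1 minus the correlation, which to our understanding is what corrplot uses for order = "hclust"; the columns m1 to m4 are made up for illustration and are not the metrics in this report.

```r
set.seed(42)
n <- 50

# Two hidden signals; m1/m2 follow the first and m3/m4 follow the second.
base_a <- runif(n)
base_b <- runif(n)
sim <- data.frame(
  m1 = base_a + rnorm(n, sd = 0.05),
  m2 = base_a + rnorm(n, sd = 0.05),
  m3 = base_b + rnorm(n, sd = 0.05),
  m4 = base_b + rnorm(n, sd = 0.05)
)

corr <- cor(sim)

# Highly correlated columns get a small distance and are merged first.
hc <- hclust(as.dist(1 - corr), method = "ward.D2")

# Order in which the columns would be arranged, and a two-cluster cut.
print(colnames(corr)[hc$order])
clusters <- cutree(hc, k = 2)
print(clusters)
```

Columns that land in the same cluster end up adjacent in the reordered matrix, which is what produces the dark blocks along the diagonal of the plot.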

Explore

To explore and compare the metrics, we first look at their distributions and then perform a principal component analysis (PCA) on all metrics except DARQ.

Density plot

The distributions show that DARQ is distributed very differently from the other metrics, as it tends to give extreme scores.

oasis_dice_num %>% pivot_longer(colnames(oasis_dice_num)) %>%
  ggplot(aes(x = value)) + geom_histogram(bins = 25) + facet_wrap(~name)

PCA

We performed a PCA on all metrics except DARQ to examine the relationships between them. Because these metrics all lie between 0 and 1, we ran a non-centered, non-scaled PCA to preserve the original values and variances of the metrics. This PCA is equivalent to performing a singular value decomposition directly on the data.
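
The equivalence between a non-centered, non-scaled PCA and a plain SVD can be checked numerically. The sketch below uses simulated values in [0, 1] and base R's prcomp(); the package actually used for the PCA in this report is not shown, so treat prcomp() here as an assumption.

```r
set.seed(1)

# Simulated stand-in for the metrics table: 20 rows, 3 columns in [0, 1].
X <- matrix(runif(60), nrow = 20, ncol = 3)

# PCA without centering or scaling.
pca <- prcomp(X, center = FALSE, scale. = FALSE)

# Singular value decomposition of the raw data: X = U diag(d) V'.
sv <- svd(X)

# Column loadings match the right singular vectors (compared up to sign).
same_loadings <- isTRUE(all.equal(abs(unname(pca$rotation)), abs(sv$v)))

# Row factor scores match U diag(d) (compared up to sign).
same_scores <- isTRUE(all.equal(abs(unname(pca$x)), abs(sv$u %*% diag(sv$d))))

# prcomp reports singular values rescaled by sqrt(n - 1).
same_sdev <- isTRUE(all.equal(pca$sdev, sv$d / sqrt(nrow(X) - 1)))

print(c(same_loadings, same_scores, same_sdev))
```

Without centering, the first component mostly reflects the overall magnitude of the scores, so later components are the ones that contrast the metrics.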

The row factor scores (participants) showed a separation roughly matching the pass/fail outcomes from DARQ. In the figure, GREEN dots indicate passing participants, and RED dots indicate failing ones.

From the column factor scores (metrics), we identified three clusters of metrics. To narrow the set down to a few metrics, we chose dice_hd_bet_probability, which we consider the most meaningful to include, and correlation_coef, which is less related to the other metrics (as indicated by the large angle between it and every other metric).

Grouping with the two metrics

Using dice_hd_bet_probability and correlation_coef, we examined their relationship and tried to derive different QC groups.

Distributions

We first checked the distributions of the two metrics and used them to choose a cut-off for each: 0.89 for dice_hd_bet_probability and 0.5 for correlation_coef (indicated by the red lines).

FinMetric <- oasis_dice %>% select(dice_hd_bet_probability, correlation_coef) %>% as.data.frame
rownames(FinMetric) <- oasis_dice$participant_id

FinMetric %>% ggplot(aes(dice_hd_bet_probability)) + geom_histogram(bins = 30) + geom_vline(xintercept = 0.89, color = "red", linewidth = 2)

FinMetric %>% ggplot(aes(correlation_coef)) + geom_histogram(bins = 25) + geom_vline(xintercept = 0.5, color = "red", linewidth = 2)

Scatter plot

We plotted the two selected metrics against each other, together with their newly derived pass/fail outcomes.

The correlation between dice_hd_bet_probability and correlation_coef is 0.8652564.

  • Group labels take the form (pass/fail on correlation_coef)_(pass/fail on dice_hd_bet_probability), with TRUE for pass and FALSE for fail.
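
The code that creates the both_group column is not shown in this report. A minimal sketch of one way to derive it from the two cut-offs is given below, using the six rows printed by head(oasis_dice). It assumes that a value strictly above the cut-off counts as a pass; the names fin_head, dice_cutoff, and corr_cutoff are hypothetical.

```r
# Six rows copied from the head(oasis_dice) output; not the full data set.
fin_head <- data.frame(
  dice_hd_bet_probability = c(0.9095625, 0.8299690, 0.9405499,
                              0.9128021, 0.9225100, 0.9409483),
  correlation_coef        = c(0.6140392, 0.4409037, 0.6240867,
                              0.5447207, 0.6027540, 0.6402254),
  row.names = c("sub-OASIS10002", "sub-OASIS10010", "sub-OASIS10011",
                "sub-OASIS10037", "sub-OASIS10042", "sub-OASIS10045")
)

# Cut-offs taken from the text above.
dice_cutoff <- 0.89
corr_cutoff <- 0.5

# Assumption: strictly above the cut-off is a pass.
pass_corr <- fin_head$correlation_coef > corr_cutoff
pass_dice <- fin_head$dice_hd_bet_probability > dice_cutoff

# Label order is (correlation_coef)_(dice_hd_bet_probability).
fin_head$both_group <- paste(pass_corr, pass_dice, sep = "_")

# Group sizes and the participants failing both checks.
print(table(fin_head$both_group))
fail_both <- rownames(fin_head)[fin_head$both_group == "FALSE_FALSE"]
print(fail_both)
```

On these six rows only sub-OASIS10010 fails both checks, which is consistent with the group listings that follow.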

Participants in 4 groups

Pass Both

FinMetric %>% filter(both_group == "TRUE_TRUE") %>% rownames
##  [1] "sub-OASIS10002" "sub-OASIS10011" "sub-OASIS10037" "sub-OASIS10042"
##  [5] "sub-OASIS10045" "sub-OASIS10051" "sub-OASIS10060" "sub-OASIS10062"
##  [9] "sub-OASIS10063" "sub-OASIS10071" "sub-OASIS10081" "sub-OASIS10103"
## [13] "sub-OASIS10112" "sub-OASIS10117" "sub-OASIS10126" "sub-OASIS10136"
## [17] "sub-OASIS10142" "sub-OASIS10150" "sub-OASIS10151" "sub-OASIS10156"
## [21] "sub-OASIS10167" "sub-OASIS10183" "sub-OASIS10184" "sub-OASIS10185"
## [25] "sub-OASIS10195" "sub-OASIS10207" "sub-OASIS10218" "sub-OASIS10260"
## [29] "sub-OASIS10261" "sub-OASIS10266" "sub-OASIS10280" "sub-OASIS10287"
## [33] "sub-OASIS10294" "sub-OASIS10303" "sub-OASIS10309" "sub-OASIS10310"
## [37] "sub-OASIS10316" "sub-OASIS10318" "sub-OASIS10323" "sub-OASIS10329"
## [41] "sub-OASIS10333" "sub-OASIS10342" "sub-OASIS10348" "sub-OASIS10357"
## [45] "sub-OASIS10363" "sub-OASIS10380" "sub-OASIS10388" "sub-OASIS10396"
## [49] "sub-OASIS10406" "sub-OASIS10413" "sub-OASIS10415" "sub-OASIS10420"
## [53] "sub-OASIS10423" "sub-OASIS10430" "sub-OASIS10432" "sub-OASIS10435"
## [57] "sub-OASIS10439" "sub-OASIS10446"

Fail Both

FinMetric %>% filter(both_group == "FALSE_FALSE") %>% rownames
##  [1] "sub-OASIS10010" "sub-OASIS10110" "sub-OASIS10162" "sub-OASIS10179"
##  [5] "sub-OASIS10199" "sub-OASIS10222" "sub-OASIS10223" "sub-OASIS10227"
##  [9] "sub-OASIS10373" "sub-OASIS10398" "sub-OASIS10400"

Pass correlation_coef, fail dice_hd_bet

FinMetric %>% filter(both_group == "TRUE_FALSE") %>% rownames
## [1] "sub-OASIS10114" "sub-OASIS10291" "sub-OASIS10377"

Fail correlation_coef, pass dice_hd_bet

FinMetric %>% filter(both_group == "FALSE_TRUE") %>% rownames
## [1] "sub-OASIS10084" "sub-OASIS10155" "sub-OASIS10180" "sub-OASIS10371"
## [5] "sub-OASIS10372" "sub-OASIS10402"